首页> 外文OA文献 >Fast Discrete Distribution Clustering Using Wasserstein Barycenter with Sparse Support
【2h】

Fast Discrete Distribution Clustering Using Wasserstein Barycenter with Sparse Support

机译:基于Wasserstein重心的快速离散分布聚类   稀疏支持

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

In a variety of research areas, the weighted bag of vectors and the histogramare widely used descriptors for complex objects. Both can be expressed asdiscrete distributions. D2-clustering pursues the minimum total within-clustervariation for a set of discrete distributions subject to theKantorovich-Wasserstein metric. D2-clustering has a severe scalability issue,the bottleneck being the computation of a centroid distribution, calledWasserstein barycenter, that minimizes its sum of squared distances to thecluster members. In this paper, we develop a modified Bregman ADMM approach forcomputing the approximate discrete Wasserstein barycenter of large clusters. Inthe case when the support points of the barycenters are unknown and have lowcardinality, our method achieves high accuracy empirically at a much reducedcomputational cost. The strengths and weaknesses of our method and itsalternatives are examined through experiments, and we recommend scenarios fortheir respective usage. Moreover, we develop both serial and parallelizedversions of the algorithm. By experimenting with large-scale data, wedemonstrate the computational efficiency of the new methods and investigatetheir convergence properties and numerical stability. The clustering resultsobtained on several datasets in different domains are highly competitive incomparison with some widely used methods in the corresponding areas.
机译:在各种研究领域中,向量的加权包和直方图被广泛用于复杂对象的描述符。两者都可以表示为离散分布。对于遵循Kantorovich-Wasserstein指标的一组离散分布,D2聚类追求最小的总聚类内变化。 D2群集存在严重的可伸缩性问题,瓶颈在于计算质心分布(称为Wasserstein重心)的方法,该方法最大程度地减少了到群集成员的平方距离之和。在本文中,我们开发了一种改进的Bregman ADMM方法,用于计算大型星团的近似离散Wasserstein重心。在重心的支撑点未知且基数较低的情况下,我们的方法凭经验以较低的计算成本实现了高精度。通过实验检查了我们方法及其替代方法的优缺点,并为各自的用法提供了推荐方案。此外,我们开发了该算法的串行和并行版本。通过试验大规模数据,证明了新方法的计算效率,并研究了它们的收敛性和数值稳定性。在不同领域的几个数据集上获得的聚类结果与相应领域中一些广泛使用的方法相比具有极强的竞争力。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号